
fix: support large index in forest car #6690

Merged
hanabi1224 merged 15 commits into main from hm/forest-car-large-index-fix on Mar 6, 2026

Conversation

Contributor

@hanabi1224 hanabi1224 commented Mar 4, 2026

Summary of changes

Changes introduced in this pull request:

  • wrap index in multiple skip frames when size > u32::MAX or FOREST_CAR_INDEX_USE_MULTIPLE_SKIP_FRAMES=1 to keep backward compatibility

Reference issue to close (if applicable)

Closes #6673

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Outside contributions

  • I have read and agree to the CONTRIBUTING document.
  • I have read and agree to the AI Policy document. I understand that failure to comply with the guidelines will lead to rejection of the pull request.

Summary by CodeRabbit

  • Bug Fixes

    • Improved error propagation when converting CAR readers to dynamic readers to prevent silent failures.
  • Improvements

    • CAR index serialization now emits Zstd skip-frame framing with multi-frame support for more robust encoding.
    • Index size widened from 32-bit to 64-bit to support larger archives and reporting.
    • Archive display updated to show human-readable byte counts.

Contributor

coderabbitai Bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Widened index size types to u64, added a ZstdSkipFramesEncodedDataReader and an async writer path write_zstd_skip_frames_into (supports single/multiple skip-frames), switched encoder/bench to the new writer, and changed several conversions to return Results to propagate errors.

Changes

  • Index size widening (src/db/car/any.rs, src/db/car/forest.rs, src/tool/subcommands/archive_cmd.rs): Changed index_size_bytes from u32/Option<u32> to u64/Option<u64>; updated APIs and display formatting to use human_count_bytes.
  • Skip-frame reader & writer (src/db/car/forest/index/mod.rs, src/db/car/forest.rs, benches/car-index.rs): Added ZstdSkipFramesEncodedDataReader<R> (Size/ReadAt impls) and a new async writer write_zstd_skip_frames_into (single/multi-skip-frame emission); the encoder and bench now use this writer.
  • ForestCar restructuring (src/db/car/forest.rs, src/db/car/forest/index/...): ForestCar.indexed now wraps the index with index::ZstdSkipFramesEncodedDataReader<...>; validate/new/into_dyn signatures updated to return index start/length and io::Result; exposed ZSTD_SKIP_FRAME_LEN; accessors use .inner().
  • Error propagation / API changes (src/db/car/any.rs, src/db/car/many.rs, src/db/car/forest/index/mod.rs): into_dyn() and Reader::map now return io::Result; callers updated to propagate errors (?) when converting readers to dynamic trait objects.
  • Tests and misc (src/db/car/forest/index/mod.rs tests, benches/car-index.rs): Tests updated to exercise single- and multi-skip-frame emission and validate header offsets; the bench uses write_zstd_skip_frames_into.

Sequence Diagram(s)

sequenceDiagram
    participant Encoder as Encoder
    participant Writer as ZstdSkipFramesWriter
    participant Sink as AsyncWrite
    participant IndexReader as IndexReader

    Encoder->>Writer: construct writer with index data
    Encoder->>Writer: call write_zstd_skip_frames_into(sink)
    Writer->>Sink: write skip-frame header (frame len)
    Writer->>Sink: write frame chunk (<=128KiB)
    alt multi-frame
        Writer->>Sink: write next skip-frame header
        Writer->>Sink: write next frame chunk
    end
    Writer-->>Encoder: return io::Result
    Note right of IndexReader: exposes inner(), Size and ReadAt for reads
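On the read side, a reader like the walkthrough's ZstdSkipFramesEncodedDataReader first has to discover where the skip-frame headers sit. A hedged std-only sketch of that scan over an in-memory buffer (function and variable names are illustrative, not the PR's API):

```rust
/// 4-byte magic + 4-byte little-endian payload length.
const ZSTD_SKIP_FRAME_LEN: usize = 8;

/// Walk the buffer and record the physical offset of each skippable-frame
/// header, stopping at the first non-matching magic or truncated header.
fn scan_skip_frame_headers(data: &[u8]) -> Vec<u64> {
    let mut offsets = Vec::new();
    let mut pos = 0usize;
    while pos + ZSTD_SKIP_FRAME_LEN <= data.len() {
        let magic = u32::from_le_bytes(data[pos..pos + 4].try_into().unwrap());
        // Skippable-frame magics share the high 28 bits: 0x184D2A5?.
        if magic & 0xFFFF_FFF0 != 0x184D_2A50 {
            break;
        }
        let len = u32::from_le_bytes(data[pos + 4..pos + 8].try_into().unwrap()) as usize;
        offsets.push(pos as u64);
        pos += ZSTD_SKIP_FRAME_LEN + len;
    }
    offsets
}

fn main() {
    // Two frames: payloads of 3 and 2 bytes.
    let mut buf = Vec::new();
    for payload in [&b"abc"[..], &b"de"[..]] {
        buf.extend_from_slice(&0x184D2A50u32.to_le_bytes());
        buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        buf.extend_from_slice(payload);
    }
    // First header at 0; second at 8 + 3 = 11.
    assert_eq!(scan_skip_frame_headers(&buf), vec![0, 11]);
}
```

With those offsets in hand, Size/ReadAt implementations can expose the concatenated payloads as one contiguous logical stream, splitting any read that crosses a frame boundary.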

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • LesnyRumcajs
  • akaladarshi
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage (⚠️ Warning): docstring coverage is 61.76%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check (✅ Passed): The title clearly and concisely describes the main change: supporting large indexes in forest car, which directly addresses the core issue of handling index sizes larger than u32.
  • Linked Issues check (✅ Passed): The PR comprehensively addresses issue #6673 by changing index_size_bytes from u32 to u64 across forest car, index reader/writer, and archive command to support large indexes without panic.
  • Out of Scope Changes check (✅ Passed): All changes are tightly scoped to supporting larger indexes (u32→u64 conversions, ZstdSkipFramesEncodedDataReader wrapper, skip-frame header emission) and directly support the issue objectives.
  • Description check (✅ Passed): Check skipped: CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@hanabi1224 hanabi1224 force-pushed the hm/forest-car-large-index-fix branch from bd36372 to 708229f Compare March 4, 2026 17:56
@hanabi1224 hanabi1224 marked this pull request as ready for review March 4, 2026 18:39
@hanabi1224 hanabi1224 requested a review from a team as a code owner March 4, 2026 18:39
@hanabi1224 hanabi1224 requested review from LesnyRumcajs and akaladarshi and removed request for a team March 4, 2026 18:39
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
src/db/car/forest/index/mod.rs (1)

195-220: Add rustdoc for the new public reader type and methods.

ZstdSkipFramesEncodedDataReader and its public API should be documented for maintainability and correct usage expectations.

As per coding guidelines: "Document all public functions and structs".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/car/forest/index/mod.rs` around lines 195 - 220, Add Rustdoc comments
for the public struct ZstdSkipFramesEncodedDataReader and its public methods
(new, inner, into_inner): document the purpose of the reader, the meaning of
skip_frame_header_offsets, the behavior of new (what it reads from the provided
ReadAt, how offsets are computed, and that it returns io::Result on failure),
what inner returns (borrowed reader) and what into_inner returns (consumes and
returns the inner reader). Keep the docs concise, include parameter/return
descriptions and any error/edge-case notes (e.g., empty input or partial
frames), and add a short usage example or note about thread-safety if
applicable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/db/car/forest.rs`:
- Around line 151-152: The subtraction using footer-derived offsets (e.g.,
computing index_start_pos = footer.index - ZSTD_SKIP_FRAME_LEN and the similar
subtraction at the other location) must use checked arithmetic to avoid
underflow/wrap; update the code that computes index_start_pos and the related
subtraction at the other occurrence to use checked_sub (or an equivalent) and
map None to an io::Error/InvalidData return (or early Err) with a clear message
indicating corrupted footer-derived offsets; reference footer.index,
ZSTD_SKIP_FRAME_LEN, and the functions/methods that consume index_start_pos so
callers get an InvalidData error instead of panicking or wrapping.

In `@src/db/car/forest/index/mod.rs`:
- Around line 245-271: The read_at implementation currently scans
skip_frame_header_offsets linearly for each call, causing O(n) per read; change
it to a binary search using partition_point (or slice::binary_search_by /
slice::partition_point) on skip_frame_header_offsets to find the count/index of
offsets <= pos in O(log n), compute adjusted_pos = pos + (count as u64) *
ZSTD_SKIP_FRAME_LEN, obtain next_frame_pos from
skip_frame_header_offsets.get(count) if any, and then apply the existing logic
(compute max_read_len, call reader.read_at(adjusted_pos, ...), and recurse into
read_at for the remainder) using the found index instead of iterating. Ensure
types (usize/u64) are converted correctly and preserve the boundary-splitting
behavior when reads cross a skip-frame.
- Around line 482-489: The constant CHUNK_FRAME_DATA_MAX_BYTES is incorrectly
set to 128 * 1024 (128KiB) while the comment and intent describe 128MiB; update
CHUNK_FRAME_DATA_MAX_BYTES to represent 128 * 1024 * 1024 bytes (or 128 << 20)
so write_zstd_skip_frames_into_inner receives the correct chunk size, then keep
the call using self.written_len() and the u32::try_from(...) conversion as-is
(adjusting types if necessary) to avoid producing excessive skip frames.

---

Nitpick comments:
In `@src/db/car/forest/index/mod.rs`:
- Around line 195-220: Add Rustdoc comments for the public struct
ZstdSkipFramesEncodedDataReader and its public methods (new, inner, into_inner):
document the purpose of the reader, the meaning of skip_frame_header_offsets,
the behavior of new (what it reads from the provided ReadAt, how offsets are
computed, and that it returns io::Result on failure), what inner returns
(borrowed reader) and what into_inner returns (consumes and returns the inner
reader). Keep the docs concise, include parameter/return descriptions and any
error/edge-case notes (e.g., empty input or partial frames), and add a short
usage example or note about thread-safety if applicable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d3235db5-8ea2-4f57-b14f-470f1e618472

📥 Commits

Reviewing files that changed from the base of the PR and between 4caac9f and eb80020.

📒 Files selected for processing (6)
  • benches/car-index.rs
  • src/db/car/any.rs
  • src/db/car/forest.rs
  • src/db/car/forest/index/mod.rs
  • src/db/car/many.rs
  • src/tool/subcommands/archive_cmd.rs

Contributor

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (2)
src/db/car/forest/index/mod.rs (2)

245-265: ⚠️ Potential issue | 🟠 Major

read_at still performs linear skip-frame lookup per call.

Line [248] scans skip_frame_header_offsets linearly on every read. With many frames, this becomes a major lookup bottleneck; switch to O(log n) frame mapping (binary search on precomputed logical boundaries).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/car/forest/index/mod.rs` around lines 245 - 265, read_at currently
scans skip_frame_header_offsets linearly every call (see read_at,
skip_frame_header_offsets, ZSTD_SKIP_FRAME_LEN, reader.read_at), causing O(n)
lookup cost; replace that loop with a binary search over a precomputed vector of
logical frame boundaries (e.g., frame_logical_starts or
cumulative_adjusted_offsets computed once when building the structure) to find
the frame index and the adjusted_pos in O(log n), then compute max_read_len from
the next boundary and perform reader.read_at(adjusted_pos, &mut
buf[..max_read_len]) followed by recursive/iterative continuation for the
remainder; ensure correct u64/usize conversions and keep behavior identical to
existing adjustments.
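The O(log n) remapping this review comment asks for can be sketched as follows (std only, with illustrative names; it assumes each header occupies ZSTD_SKIP_FRAME_LEN = 8 bytes and contributes no logical bytes):

```rust
/// 4-byte magic + 4-byte little-endian payload length.
const ZSTD_SKIP_FRAME_LEN: u64 = 8;

/// Logical start of each frame's payload: the i-th header sits at
/// `header_offsets[i]` physically, preceded by i earlier headers, so its
/// payload begins at logical offset header_offsets[i] - i * 8.
fn frame_logical_starts(header_offsets: &[u64]) -> Vec<u64> {
    header_offsets
        .iter()
        .enumerate()
        .map(|(i, &h)| h - i as u64 * ZSTD_SKIP_FRAME_LEN)
        .collect()
}

/// Map a logical read position to its physical position in O(log n):
/// add one header length for every frame starting at or before `pos`.
fn logical_to_physical(logical_starts: &[u64], pos: u64) -> u64 {
    let count = logical_starts.partition_point(|&l| l <= pos) as u64;
    pos + count * ZSTD_SKIP_FRAME_LEN
}

fn main() {
    // Headers at physical offsets 0 and 108 (first payload is 100 bytes).
    let starts = frame_logical_starts(&[0, 108]);
    assert_eq!(starts, vec![0, 100]);
    assert_eq!(logical_to_physical(&starts, 0), 8); // first payload byte
    assert_eq!(logical_to_physical(&starts, 99), 107); // last byte of frame 0
    assert_eq!(logical_to_physical(&starts, 100), 116); // first byte of frame 1
}
```

A read that spans the next logical frame start would still be clamped and continued, but the per-call lookup drops from a linear scan to a single partition_point over the precomputed boundaries.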

477-485: ⚠️ Potential issue | 🟠 Major

Chunk-size constant does not match the stated 128MiB intent.

Line [479] uses 128 * 1024 (128KiB), which creates far more skip frames than intended and amplifies read-path overhead.

Proposed fix
-        const CHUNK_FRAME_DATA_MAX_BYTES: usize = 128 * 1024;
+        const CHUNK_FRAME_DATA_MAX_BYTES: usize = 128 * 1024 * 1024;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/car/forest/index/mod.rs` around lines 477 - 485, The
CHUNK_FRAME_DATA_MAX_BYTES constant in write_zstd_skip_frames_into is set to 128
* 1024 (128KiB) but the intent is 128MiB; update CHUNK_FRAME_DATA_MAX_BYTES to
128 * 1024 * 1024 (or an equivalent expression like 128 << 20) inside
write_zstd_skip_frames_into so the call to write_zstd_skip_frames_into_inner
uses the correct 128MiB chunk size, leaving the rest of the function (including
the call to write_zstd_skip_frames_into_inner and
u32::try_from(written_len).ok()) unchanged.
🧹 Nitpick comments (1)
src/db/car/forest/index/mod.rs (1)

195-220: Add rustdoc for the new public reader API.

Line [195] introduces a public struct and Lines [201], [214], and [218] expose public methods without documentation. Please add concise docs for expected input format and malformed-data behavior.

As per coding guidelines "Document all public functions and structs".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/car/forest/index/mod.rs` around lines 195 - 220, Add Rustdoc comments
for the public struct ZstdSkipFramesEncodedDataReader and its public methods
new, inner, and into_inner: describe the expected input format (a sequence of
skip-frame headers followed by ZSTD_SKIP_FRAME_LEN + payload length, with
lengths read as little-endian u32 at offset+4), state that new parses the entire
reader to build skip_frame_header_offsets, and document malformed-data behavior
(what happens when read_u32_at fails — e.g., it returns an io::Result error from
new — and that inner/into_inner simply return the wrapped reader without
validation). Keep the docs concise, mention ownership/borrowing semantics for
inner vs into_inner, and reference ZSTD_SKIP_FRAME_LEN for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/db/car/forest/index/mod.rs`:
- Around line 245-265: read_at currently scans skip_frame_header_offsets
linearly every call (see read_at, skip_frame_header_offsets,
ZSTD_SKIP_FRAME_LEN, reader.read_at), causing O(n) lookup cost; replace that
loop with a binary search over a precomputed vector of logical frame boundaries
(e.g., frame_logical_starts or cumulative_adjusted_offsets computed once when
building the structure) to find the frame index and the adjusted_pos in O(log
n), then compute max_read_len from the next boundary and perform
reader.read_at(adjusted_pos, &mut buf[..max_read_len]) followed by
recursive/iterative continuation for the remainder; ensure correct u64/usize
conversions and keep behavior identical to existing adjustments.
- Around line 477-485: The CHUNK_FRAME_DATA_MAX_BYTES constant in
write_zstd_skip_frames_into is set to 128 * 1024 (128KiB) but the intent is
128MiB; update CHUNK_FRAME_DATA_MAX_BYTES to 128 * 1024 * 1024 (or an equivalent
expression like 128 << 20) inside write_zstd_skip_frames_into so the call to
write_zstd_skip_frames_into_inner uses the correct 128MiB chunk size, leaving
the rest of the function (including the call to
write_zstd_skip_frames_into_inner and u32::try_from(written_len).ok())
unchanged.

---

Nitpick comments:
In `@src/db/car/forest/index/mod.rs`:
- Around line 195-220: Add Rustdoc comments for the public struct
ZstdSkipFramesEncodedDataReader and its public methods new, inner, and
into_inner: describe the expected input format (a sequence of skip-frame headers
followed by ZSTD_SKIP_FRAME_LEN + payload length, with lengths read as
little-endian u32 at offset+4), state that new parses the entire reader to build
skip_frame_header_offsets, and document malformed-data behavior (what happens
when read_u32_at fails — e.g., it returns an io::Result error from new — and
that inner/into_inner simply return the wrapped reader without validation). Keep
the docs concise, mention ownership/borrowing semantics for inner vs into_inner,
and reference ZSTD_SKIP_FRAME_LEN for clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9d79aa75-859e-4040-b22d-457ff296ae51

📥 Commits

Reviewing files that changed from the base of the PR and between eb80020 and 4ad5d1d.

📒 Files selected for processing (1)
  • src/db/car/forest/index/mod.rs

@hanabi1224 hanabi1224 force-pushed the hm/forest-car-large-index-fix branch from e6c8c27 to ce346cb Compare March 4, 2026 18:58
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
src/db/car/forest/index/mod.rs (1)

245-265: ⚠️ Potential issue | 🟠 Major

read_at remains linear per call and can bottleneck large multi-frame indices.

Line 248 iterates all skip-frame offsets, and Line 264 recurses, so repeated reads across many frames can degrade heavily. Please switch to indexed lookup (e.g., binary-search-based frame mapping) to keep lookup cost predictable at scale.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/car/forest/index/mod.rs` around lines 245 - 265, The read_at
implementation on read_at is doing a linear scan over skip_frame_header_offsets
and recursive reads, causing O(n) per call and worst-case O(n^2) across many
frames; replace the linear iteration/recursion with an indexed lookup (e.g.,
binary_search on skip_frame_header_offsets) to map pos→frame quickly, compute
adjusted_pos once using the found index and ZSTD_SKIP_FRAME_LEN, then perform a
bounded single reader.read_at call (reader.read_at) for the chunk within that
frame and loop/iterate (not recurse) to continue for remaining bytes—update
read_at to use binary_search (or equivalent) over skip_frame_header_offsets to
find next_frame_pos and avoid per-call linear scans and recursion.
src/db/car/forest.rs (1)

166-166: ⚠️ Potential issue | 🟠 Major

Guard index range computation with checked subtraction.

Line 166 still performs unchecked index_end_pos - index_start_pos. Corrupt footer data can make this underflow; return InvalidData instead of panicking/wrapping.

Proposed fix
-        Ok((header, index_start_pos, index_end_pos - index_start_pos))
+        let index_size_bytes = index_end_pos.checked_sub(index_start_pos).ok_or_else(|| {
+            invalid_data(format!(
+                "invalid footer offsets: index_start_pos({index_start_pos}) > index_end_pos({index_end_pos})"
+            ))
+        })?;
+        Ok((header, index_start_pos, index_size_bytes))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/car/forest.rs` at line 166, The subtraction index_end_pos -
index_start_pos can underflow on corrupt footer data; replace the unchecked
subtraction with a checked subtraction (e.g.,
index_end_pos.checked_sub(index_start_pos)) and if it returns None return an Err
with InvalidData (std::io::Error::new(ErrorKind::InvalidData, ...)) instead of
producing a wrapped/negative value; update the Ok((header, ...)) return to use
the checked difference.
🧹 Nitpick comments (1)
src/db/car/forest/index/mod.rs (1)

195-220: Add rustdoc for the new public reader API.

ZstdSkipFramesEncodedDataReader and its public methods are exported without documentation. Please add concise docs for expected input format, invariants, and failure modes.

As per coding guidelines, Document all public functions and structs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/car/forest/index/mod.rs` around lines 195 - 220, Add Rustdoc comments
for the public ZstdSkipFramesEncodedDataReader struct and its public methods
(new, inner, into_inner): describe the expected input format (sequence of
skip-frame headers followed by frames, how header length is read via read_u32_at
with LittleEndian and the use of ZSTD_SKIP_FRAME_LEN), list invariants
(skip_frame_header_offsets contains offsets discovered during construction,
reader must implement ReadAt, the reader is not validated beyond iterating until
read_u32_at fails), and document failure modes (new returns io::Error if
underlying read_u32_at fails unexpectedly, or stops normally when EOF/invalid
header occurs; callers should handle potential partial/invalid data), plus any
ownership/borrowing behavior (inner returns a reference; into_inner consumes and
returns the inner reader). Ensure doc comments are concise, placed above the
struct and each method, and follow crate doc-style guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/db/car/forest.rs`:
- Around line 191-201: The indexed remapping in into_dyn loses the original
bounded length by passing None as the slice length; preserve the original index
bound by extracting the index size from the original slice (e.g., call
slice.inner().index_size_bytes() or the appropriate accessor on slice.inner())
and pass Some(index_size_bytes) instead of None to positioned_io::Slice::new
when constructing ZstdSkipFramesEncodedDataReader in ForestCar::into_dyn so the
remapped reader only exposes the index bytes.

---

Duplicate comments:
In `@src/db/car/forest.rs`:
- Line 166: The subtraction index_end_pos - index_start_pos can underflow on
corrupt footer data; replace the unchecked subtraction with a checked
subtraction (e.g., index_end_pos.checked_sub(index_start_pos)) and if it returns
None return an Err with InvalidData (std::io::Error::new(ErrorKind::InvalidData,
...)) instead of producing a wrapped/negative value; update the Ok((header,
...)) return to use the checked difference.

In `@src/db/car/forest/index/mod.rs`:
- Around line 245-265: The read_at implementation on read_at is doing a linear
scan over skip_frame_header_offsets and recursive reads, causing O(n) per call
and worst-case O(n^2) across many frames; replace the linear iteration/recursion
with an indexed lookup (e.g., binary_search on skip_frame_header_offsets) to map
pos→frame quickly, compute adjusted_pos once using the found index and
ZSTD_SKIP_FRAME_LEN, then perform a bounded single reader.read_at call
(reader.read_at) for the chunk within that frame and loop/iterate (not recurse)
to continue for remaining bytes—update read_at to use binary_search (or
equivalent) over skip_frame_header_offsets to find next_frame_pos and avoid
per-call linear scans and recursion.

---

Nitpick comments:
In `@src/db/car/forest/index/mod.rs`:
- Around line 195-220: Add Rustdoc comments for the public
ZstdSkipFramesEncodedDataReader struct and its public methods (new, inner,
into_inner): describe the expected input format (sequence of skip-frame headers
followed by frames, how header length is read via read_u32_at with LittleEndian
and the use of ZSTD_SKIP_FRAME_LEN), list invariants (skip_frame_header_offsets
contains offsets discovered during construction, reader must implement ReadAt,
the reader is not validated beyond iterating until read_u32_at fails), and
document failure modes (new returns io::Error if underlying read_u32_at fails
unexpectedly, or stops normally when EOF/invalid header occurs; callers should
handle potential partial/invalid data), plus any ownership/borrowing behavior
(inner returns a reference; into_inner consumes and returns the inner reader).
Ensure doc comments are concise, placed above the struct and each method, and
follow crate doc-style guidelines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b95605f5-6514-4db4-913c-f403adbaebdb

📥 Commits

Reviewing files that changed from the base of the PR and between 4ad5d1d and ce346cb.

📒 Files selected for processing (2)
  • src/db/car/forest.rs
  • src/db/car/forest/index/mod.rs


codecov Bot commented Mar 4, 2026

Codecov Report

❌ Patch coverage is 81.21547% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.38%. Comparing base (6ee5a3f) to head (8e9078c).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/db/car/forest/index/mod.rs 84.66% 8 Missing and 15 partials ⚠️
src/db/car/forest.rs 62.50% 6 Missing and 3 partials ⚠️
src/db/car/any.rs 75.00% 0 Missing and 1 partial ⚠️
src/db/car/many.rs 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
src/tool/subcommands/archive_cmd.rs 29.47% <100.00%> (ø)
src/db/car/any.rs 67.64% <75.00%> (-0.99%) ⬇️
src/db/car/many.rs 66.66% <0.00%> (-0.58%) ⬇️
src/db/car/forest.rs 82.77% <62.50%> (-1.12%) ⬇️
src/db/car/forest/index/mod.rs 87.14% <84.66%> (-1.41%) ⬇️

... and 12 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6ee5a3f...8e9078c. Read the comment docs.


Contributor

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
src/db/car/forest.rs (1)

166-166: ⚠️ Potential issue | 🟠 Major

Guard footer-derived index size math with checked_sub.

Line 166 still does unchecked subtraction on on-disk values. Corrupt footer data can underflow and panic/wrap instead of returning InvalidData.

Proposed hardening
-        Ok((header, index_start_pos, index_end_pos - index_start_pos))
+        let index_size_bytes = index_end_pos
+            .checked_sub(index_start_pos)
+            .ok_or_else(|| invalid_data("invalid footer: index start is after index end"))?;
+        Ok((header, index_start_pos, index_size_bytes))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/car/forest.rs` at line 166, The subtraction index_end_pos -
index_start_pos is unchecked and can underflow on corrupt footer data; replace
it with a checked subtraction (use index_end_pos.checked_sub(index_start_pos))
and if it returns None return an Err(io::Error::new(ErrorKind::InvalidData,
...)) instead of panicking—update the return for Ok((header, index_start_pos,
...)) to use the validated size and ensure the error path yields InvalidData;
refer to the symbols header, index_start_pos, and index_end_pos to locate the
change.
🧹 Nitpick comments (1)
src/db/car/forest.rs (1)

179-192: Add rustdoc for public APIs touched in this change.

index_size_bytes and into_dyn are public and currently undocumented.

As per coding guidelines, "Document all public functions and structs".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/car/forest.rs` around lines 179 - 192, Add concise rustdoc comments
for the public methods index_size_bytes and into_dyn (and the public ForestCar
type if not already documented): describe what value index_size_bytes returns
(units and meaning), and explain into_dyn’s behavior and return type (that it
converts the ForestCar into a dynamic boxed RandomAccessFileReader wrapped in
io::Result). Place the docs immediately above the fn signatures
(index_size_bytes, into_dyn) and ensure they follow Rust doc conventions (///)
and mention error conditions for into_dyn if any.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/db/car/forest.rs`:
- Line 166: The subtraction index_end_pos - index_start_pos is unchecked and can
underflow on corrupt footer data; replace it with a checked subtraction (use
index_end_pos.checked_sub(index_start_pos)) and if it returns None return an
Err(io::Error::new(ErrorKind::InvalidData, ...)) instead of panicking—update the
return for Ok((header, index_start_pos, ...)) to use the validated size and
ensure the error path yields InvalidData; refer to the symbols header,
index_start_pos, and index_end_pos to locate the change.

---

Nitpick comments:
In `@src/db/car/forest.rs`:
- Around line 179-192: Add concise rustdoc comments for the public methods
index_size_bytes and into_dyn (and the public ForestCar type if not already
documented): describe what value index_size_bytes returns (units and meaning),
and explain into_dyn’s behavior and return type (that it converts the ForestCar
into a dynamic boxed RandomAccessFileReader wrapped in io::Result). Place the
docs immediately above the fn signatures (index_size_bytes, into_dyn) and ensure
they follow Rust doc conventions (///) and mention error conditions for into_dyn
if any.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e1e10bac-0a6b-4bc0-ad90-f27d61769f84

📥 Commits

Reviewing files that changed from the base of the PR and between ce346cb and a351a25.

📒 Files selected for processing (1)
  • src/db/car/forest.rs

@hanabi1224 hanabi1224 requested a review from LesnyRumcajs March 5, 2026 13:07
@hanabi1224 hanabi1224 force-pushed the hm/forest-car-large-index-fix branch from ff6f458 to 8e9078c Compare March 5, 2026 20:59
@hanabi1224 hanabi1224 enabled auto-merge March 6, 2026 08:57
@hanabi1224 hanabi1224 added this pull request to the merge queue Mar 6, 2026
Merged via the queue into main with commit 13cd7c3 Mar 6, 2026
58 of 59 checks passed
@hanabi1224 hanabi1224 deleted the hm/forest-car-large-index-fix branch March 6, 2026 09:27
@coderabbitai coderabbitai Bot mentioned this pull request Apr 2, 2026
6 tasks


Development

Successfully merging this pull request may close these issues.

`forest-tool archive merge` panics with TryFromIntError on large output

2 participants